Abstract: A well-known analysis of Tropp and Gilbert shows that orthogonal matching pursuit (OMP) can recover a $k$-sparse $n$-dimensional real vector from $m = 4k\log(n)$ noise-free linear measurements obtained through a random Gaussian measurement matrix with a probability that approaches one as $n \to \infty$. This work strengthens this result by showing that a lower number of measurements, $m = 2k\log(n-k)$, is in fact sufficient for asymptotic recovery. More generally, when the sparsity level satisfies $k_{\min} \le k \le k_{\max}$ but is unknown, $m = 2k_{\max}\log(n-k_{\min})$ measurements are sufficient. Furthermore, this number of measurements is also sufficient for detection of the sparsity pattern (support) of the vector with measurement errors provided the signal-to-noise ratio (SNR) scales to infinity. The scaling $m = 2k\log(n-k)$ exactly matches the number of measurements required by the more complex lasso method for signal recovery with a similar SNR scaling.

Index Terms: compressed sensing, detection, lasso, orthogonal matching pursuit, random matrices, sparse approximation, sparsity, subset selection

I. Introduction 

Suppose $x \in \mathbb{R}^n$ is a sparse vector, meaning its number of nonzero entries $k$ is smaller than $n$. The support of $x$ is the set of locations of the nonzero entries and is sometimes called its sparsity pattern. A common sparse estimation problem is to infer the sparsity pattern of $x$ from linear measurements of the form

$$y = Ax + w, \tag{1}$$

where $A \in \mathbb{R}^{m\times n}$ is a known measurement matrix, $y \in \mathbb{R}^m$ represents a vector of measurements and $w \in \mathbb{R}^m$ is a vector of measurement errors (noise).

Sparsity pattern detection and related sparse estimation problems are classical problems in nonlinear signal processing and arise in a variety of applications including wavelet-based image processing [1] and statistical model selection in linear regression [2]. There has also been considerable recent interest in sparsity pattern detection in the context of compressed sensing, which focuses on large random measurement matrices $A$ [3]-[5]. It is this scenario with random measurements that will be analyzed here.

Optimal subset recovery is NP-hard [6] and usually involves searches over all the $\binom{n}{k}$ possible support sets of $x$. Thus, most attention has focused on approximate methods. One simple and popular approximate algorithm is orthogonal matching pursuit (OMP). OMP is a greedy method that identifies the location of one nonzero entry of $x$ at a time. A version of the algorithm will be described in detail below in Section II. The best known analysis of the detection performance of OMP for large random matrices is due to Tropp and Gilbert [10], [11]. Among other results, Tropp and Gilbert show that when $A$ has i.i.d. Gaussian entries, the measurements are noise-free ($w = 0$), and the number of measurements scales as

$$m \ge (1+\delta)4k\log(n) \tag{2}$$

for some $\delta > 0$, the OMP method will recover the correct sparsity pattern of $x$ with a probability that approaches one as $n, k \to \infty$. The analysis uses a deterministic sufficient condition for success on the matrix $A$ based on a greedy selection ratio introduced in [12]. A similar deterministic condition on $A$ was presented in [13], and a condition using the restricted isometry property was given in [14].

Numerical experiments reported in [10] suggest that a smaller number of measurements than (2) may be sufficient for asymptotic recovery with OMP. Specifically, the experiments suggest that the constant 4 can be reduced to 2.

Our main result, Theorem 1 below, does a bit better than proving this conjecture. We show that the scaling in measurements

$$m \ge (1+\delta)2k\log(n-k) \tag{3}$$

is sufficient for asymptotic reliable recovery with OMP provided both $n-k \to \infty$ and $k \to \infty$. Theorem 1 goes further by allowing uncertainty in the sparsity level $k$.

We also improve upon the Tropp-Gilbert analysis by accounting for the effect of the noise $w$. While the Tropp-Gilbert analysis requires that the measurements are noise-free, we show that the scaling (3) is also sufficient when there is noise $w$, provided the signal-to-noise ratio (SNR) goes to infinity.

The main significance of the new scaling (3) is that it exactly matches the conditions for sparsity pattern recovery using the well-known lasso method. The lasso method, which will be described in detail in Section IV, is based on a convex relaxation of the optimal detection problem. The best analysis of sparsity pattern recovery with lasso is due to Wainwright [15], [16]. He showed in [15] that under a similar high SNR assumption, the scaling (3) in number of measurements is both necessary and sufficient for asymptotic reliable sparsity pattern detection.¹ The lasso method is often more complex than OMP, but it is widely believed to offset this disadvantage with superior performance [10]. Our results show that, at least for sparsity pattern recovery under our asymptotic assumptions, OMP performs at least as well as lasso.² Hence, the additional complexity of lasso for these problems may not be warranted.

Neither lasso nor OMP is the best known approximate algorithm for sparsity pattern recovery. For example, when there is no noise in the measurements, the lasso minimization (15) can be replaced by

$$\hat{x} = \arg\min_{v} \|v\|_1 \quad \text{s.t.} \quad y = Av.$$

A well-known analysis due to Donoho and Tanner [17] shows that, for i.i.d. Gaussian measurement matrices, this minimization will recover the correct vector with

$$m \approx 2k\log(n/m) \tag{4}$$

when $k \ll n$. This scaling is fundamentally better than the scaling (3) achieved by OMP and lasso.

There are also several variants of OMP that have shown improved performance. The CoSaMP algorithm of Needell and Tropp [18] and the subspace pursuit algorithm of Dai and Milenkovic [19] achieve a scaling similar to (4). Other variants of OMP include the stagewise OMP [20] and regularized OMP [21], [22]. Indeed, with the recent interest in compressed sensing, there is now a wide range of promising algorithms available. We do not claim that OMP achieves the best performance in any sense. Rather, we simply intend to show that both OMP and lasso have similar performance in certain scenarios.

Our proof of (3) follows along the same lines as Tropp and Gilbert's proof of (2), but with two key differences. First, we account for the effect of the noise by separately considering its effect in the "true" subspace and its orthogonal complement. Second and more importantly, we address the "nasty independence issues" noted by Tropp and Gilbert [11] by providing a tighter bound on the maximum correlation of the incorrect vectors. Specifically, in each iteration of the OMP algorithm, there are $n-k$ possible incorrect vectors that the algorithm can choose. Since the algorithm runs for $k$ iterations, there are a total of $k(n-k)$ possible error events. The Tropp and Gilbert proof bounds the probability of these error events with a union bound, essentially treating them as statistically independent. However, here we show that the energies on any one of the incorrect vectors across the $k$ iterations are correlated. In fact, they are precisely described by samples of a certain normalized Brownian motion. Exploiting this correlation, we show that the tail bound on the error probability grows as if there were only $n-k$ independent events, not $k(n-k)$.

¹ Sufficient conditions under weaker conditions on the SNR are more subtle [16]: the scaling of SNR with $n$ determines the sequences of regularization parameters for which asymptotic almost sure success is achieved, and the regularization parameter sequence affects the sufficient number of measurements.

² Recall that our result is a sufficient condition for success whereas the matching condition for lasso is both necessary and sufficient.



The outline of the remainder of this paper is as follows. Section II describes the OMP algorithm. Our main result, Theorem 1, is stated in Section III. A comparison to lasso is provided in Section IV, and we suggest some future problems in Section VII. The proof of the main result is somewhat long and is given in Section VIII. The main result was first reported in [23].

II. Orthogonal Matching Pursuit 

To describe the algorithm, suppose we wish to determine the vector $x$ from a vector $y$ of the form (1). Let

$$I_{\rm true} = \{ j : x_j \neq 0 \}, \tag{5}$$

which is the support of the vector $x$. The set $I_{\rm true}$ will also be called the sparsity pattern. Let $k = |I_{\rm true}|$, which is the number of nonzero entries of $x$. The OMP algorithm produces a sequence of estimates $I(t)$, $t = 0, 1, 2, \ldots$, of the sparsity pattern $I_{\rm true}$, adding one index at a time. In the description below, let $a_j$ denote the $j$th column of $A$.

Algorithm 1 (Orthogonal Matching Pursuit): Given a vector $y \in \mathbb{R}^m$, a measurement matrix $A \in \mathbb{R}^{m\times n}$, and threshold level $\mu > 0$, compute an estimate $\hat{I}_{\rm OMP}$ of the sparsity pattern of $x$ as follows:

1) Initialize $t = 0$ and $I(0) = \emptyset$.

2) Compute $P(t)$, the projection operator onto the orthogonal complement of the span of $\{a_i, i \in I(t)\}$.

3) For each $j$, compute

$$\rho(t,j) = \frac{|a_j' P(t) y|^2}{\|P(t)y\|^2},$$

and let

$$[\rho^*(t), i^*(t)] = \max_{j=1,\ldots,n} \rho(t,j), \tag{6}$$

where $\rho^*(t)$ is the value of the maximum and $i^*(t)$ is an index that achieves the maximum.

4) If $\rho^*(t) > \mu$, set $I(t+1) = I(t) \cup \{i^*(t)\}$. Also, increment $t = t+1$ and return to step 2.

5) Otherwise stop. The final estimate of the sparsity pattern is $\hat{I}_{\rm OMP} = I(t)$.

Note that since $P(t)$ is the projection onto the orthogonal complement of the span of $\{a_j, j \in I(t)\}$, for all $j \in I(t)$ we have $P(t)a_j = 0$. Hence, $\rho(t,j) = 0$ for all $j \in I(t)$, and therefore the algorithm will not select the same vector twice.
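The steps of Algorithm 1 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `omp`, the `max_iter` cap and the relative tolerance `tol` (used to stop on an exact noise-free fit, where $\|P(t)y\|^2$ is only round-off) are additions for the sketch. The projection $P(t)y$ is computed as a least-squares residual instead of forming $P(t)$ explicitly.

```python
import numpy as np


def omp(y, A, mu, max_iter=None, tol=1e-12):
    """Sketch of Algorithm 1 (OMP with a threshold stopping rule).

    y: measurement vector of length m; A: m x n measurement matrix;
    mu: threshold level mu > 0. Returns the estimated support, sorted.
    """
    m, n = A.shape
    if max_iter is None:
        max_iter = min(m, n)  # at most m columns can be linearly independent
    support = []
    y_energy = float(y @ y)
    for _ in range(max_iter):
        # Step 2: r = P(t) y, the residual of y after removing its
        # component in span{a_i : i in I(t)}.
        if support:
            coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            r = y - A[:, support] @ coef
        else:
            r = y
        r_energy = float(r @ r)
        if r_energy <= tol * y_energy:
            break  # y is (numerically) explained exactly; stop
        # Step 3: rho(t, j) = |a_j' P(t) y|^2 / ||P(t) y||^2 for every j.
        # P(t) is symmetric and idempotent, so a_j' P(t) y = a_j' r.
        rho = (A.T @ r) ** 2 / r_energy
        i_star = int(np.argmax(rho))
        # Steps 4-5: add i*(t) if rho*(t) > mu, otherwise stop.
        if rho[i_star] <= mu:
            break
        support.append(i_star)
    return sorted(support)
```

With noise-free measurements and $m$ comfortably above (3), a call such as `omp(A @ x, A, mu=0.05)` would typically return the support of `x`; with noise, the choice of `mu` matters, which is the subject of Section V.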

The algorithm above only provides an estimate, $\hat{I}_{\rm OMP}$, of the sparsity pattern $I_{\rm true}$. Using $\hat{I}_{\rm OMP}$, one can estimate the vector $x$ in a number of ways. For example, one can take the least-squares estimate,

$$\hat{x} = \arg\min_{v} \|y - Av\|^2 \tag{7}$$

where the minimization is over all vectors $v$ such that $v_j = 0$ for all $j \notin \hat{I}_{\rm OMP}$. The resulting $A\hat{x}$ is the projection of the noisy vector $y$ onto the space spanned by the vectors $a_i$ with $i$ in the sparsity pattern estimate $\hat{I}_{\rm OMP}$. This paper only analyzes the sparsity pattern estimate $\hat{I}_{\rm OMP}$ itself, and not the vector estimate $\hat{x}$.
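Given a support estimate, the least-squares estimate (7) reduces to an ordinary least-squares fit on the selected columns, with all other entries set to zero. A minimal NumPy sketch (the function name `ls_on_support` is hypothetical):

```python
import numpy as np


def ls_on_support(y, A, support):
    """Least-squares estimate (7): minimize ||y - A v||^2 subject to
    v_j = 0 for every j outside `support`."""
    x_hat = np.zeros(A.shape[1])
    support = list(support)
    if support:
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x_hat[support] = coef
    return x_hat
```

The residual `y - A @ x_hat` is orthogonal to every selected column, which is the projection property stated above.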
III. Asymptotic Analysis

We analyze the OMP algorithm of the previous section under the following assumptions.

Assumption 1: Consider a sequence of sparse recovery problems, indexed by the vector dimension $n$. For each $n$, let $x \in \mathbb{R}^n$ be a deterministic vector. Also assume:

(a) The sparsity level $k = k(n)$ (i.e., number of nonzero entries in $x$) satisfies

$$k(n) \in [k_{\min}(n), k_{\max}(n)] \tag{8}$$

for some deterministic sequences $k_{\min}(n)$ and $k_{\max}(n)$ with $k_{\min}(n) \to \infty$ as $n \to \infty$ and $k_{\max}(n) < n/2$ for all $n$.

(b) The number of measurements $m = m(n)$ is a deterministic sequence satisfying

$$m \ge (1+\delta)2k_{\max}\log(n-k_{\min}) \tag{9}$$

for some $\delta > 0$.

(c) The minimum component power $x_{\min}^2$ satisfies

$$\lim_{n\to\infty} k x_{\min}^2 = \infty, \tag{10}$$

where

$$x_{\min} = \min_{j \in I_{\rm true}} |x_j| \tag{11}$$

is the magnitude of the smallest nonzero entry of $x$.

(d) The powers of the vectors $\|x\|^2$ satisfy

$$\lim_{n\to\infty} \frac{1}{(n-k)^{\epsilon}} \log\left(1 + \|x\|^2\right) = 0 \tag{12}$$

for all $\epsilon > 0$.

(e) The vector $y$ is a random vector generated by (1) where $A$ and $w$ have i.i.d. Gaussian entries with zero mean and variance $1/m$.

Assumption 1(a) provides a range on the sparsity level $k$. As we will see below in Section V, bounds on this range are necessary for proper selection of the threshold level $\mu > 0$.

Assumption 1(b) is the scaling law on the number of measurements that we will show is sufficient for asymptotic reliable recovery. In the special case when $k$ is known so that $k_{\max} = k_{\min} = k$, we obtain the simpler scaling law

$$m \ge (1+\delta)2k\log(n-k). \tag{13}$$

We have contrasted this scaling law with the Tropp-Gilbert scaling law (2) in Section I. We will also compare it to the scaling law for lasso in Section IV.
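To make the comparison concrete, the measurement counts in (2), (9) and (13) can be evaluated directly. The sketch below assumes natural logarithms (consistent with the $2\log(k)$ Gaussian-maximum bound used in the proof) and uses hypothetical function names; it only evaluates the formulas and says nothing about finite-$n$ performance.

```python
import math


def m_tropp_gilbert(n, k, delta=0.0):
    """Scaling (2): m >= (1 + delta) * 4k log(n)."""
    return (1.0 + delta) * 4.0 * k * math.log(n)


def m_omp_known_k(n, k, delta=0.0):
    """Scaling (13): m >= (1 + delta) * 2k log(n - k)."""
    return (1.0 + delta) * 2.0 * k * math.log(n - k)


def m_omp_unknown_k(n, k_min, k_max, delta=0.0):
    """Scaling (9): m >= (1 + delta) * 2 k_max log(n - k_min)."""
    return (1.0 + delta) * 2.0 * k_max * math.log(n - k_min)


if __name__ == "__main__":
    for n, k in [(1000, 10), (10000, 100)]:
        print(n, k, round(m_tropp_gilbert(n, k)), round(m_omp_known_k(n, k)))
```

Since $4k\log(n) / (2k\log(n-k)) = 2\log(n)/\log(n-k) > 2$, the Tropp-Gilbert count is always more than twice the count in (13).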

Assumption 1(c) is critical and places constraints on the smallest component magnitude. The importance of the smallest component magnitude in the detection of the sparsity pattern was first recognized by Wainwright [15], [16], [24]. Also, as discussed in [25], the condition requires that the signal-to-noise ratio (SNR) goes to infinity. Specifically, if we define the SNR as

$$\mathrm{SNR} = \frac{E\|Ax\|^2}{E\|w\|^2},$$

then under Assumption 1(e) it can be easily checked (since $E\|Ax\|^2 = \|x\|^2$ and $E\|w\|^2 = 1$) that

$$\mathrm{SNR} = \|x\|^2. \tag{14}$$

Since $x$ has $k$ nonzero entries, $\|x\|^2 \ge k x_{\min}^2$, and therefore condition (10) requires that $\mathrm{SNR} \to \infty$. For this reason, we will call our analysis of OMP a high-SNR analysis. The analysis of OMP with SNR that remains bounded above is an interesting open problem.
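The identity (14) can be checked numerically: under Assumption 1(e), each entry of $Ax$ has variance $\|x\|^2/m$ and each entry of $w$ has variance $1/m$. The Monte Carlo sketch below (the function name and trial counts are illustrative assumptions) estimates the ratio of expectations by a ratio of sample means.

```python
import numpy as np


def empirical_snr(x, m, trials, rng):
    """Monte Carlo estimate of SNR = E||Ax||^2 / E||w||^2 when A and w
    have i.i.d. N(0, 1/m) entries, as in Assumption 1(e)."""
    n = x.size
    signal = 0.0
    noise = 0.0
    for _ in range(trials):
        A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
        w = rng.normal(0.0, 1.0 / np.sqrt(m), size=m)
        signal += float(np.sum((A @ x) ** 2))
        noise += float(np.sum(w ** 2))
    return signal / noise  # ratio of the two sample means
```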

Assumption 1(d) is technical and simply requires that the SNR does not grow too quickly with $n$. Note that even if $\mathrm{SNR} = O(k^{\alpha})$ for any $\alpha > 0$, Assumption 1(d) will be satisfied.

Assumption 1(e) states that our analysis concerns large Gaussian measurement matrices $A$ and Gaussian noise $w$. Our main result is as follows.

Theorem 1: Under Assumption 1, there exists a sequence of threshold levels $\mu = \mu(n)$ such that the OMP method in Algorithm 1 will asymptotically detect the correct sparsity pattern in that

$$\lim_{n\to\infty} \Pr\left(\hat{I}_{\rm OMP} \neq I_{\rm true}\right) = 0.$$

Moreover, the threshold levels $\mu$ can be selected simply as a function of $k_{\min}$, $k_{\max}$, $n$, $m$ and $\delta$.

Theorem 1 provides our main scaling law for OMP. The proof is given in Section VIII.



IV. Comparison to Lasso Performance 

It is useful to compare the scaling law (13) to the number of measurements required by the widely-used lasso method described, for example, in [26]. The lasso method finds an estimate for the vector $x$ in (1) by solving the quadratic program

$$\hat{x} = \arg\min_{v \in \mathbb{R}^n} \|y - Av\|_2^2 + \mu\|v\|_1, \tag{15}$$

where $\mu > 0$ is an algorithm parameter that trades off the prediction error with the sparsity of the solution. Lasso is sometimes referred to as basis pursuit denoising [27]. While the optimization (15) is convex, the running time of lasso is significantly longer than OMP unless $A$ has some particular structure [10]. However, it is generally believed that lasso has superior performance.

The best analysis of lasso for sparsity pattern recovery for large random matrices is due to Wainwright [15], [16]. There, it is shown that with an i.i.d. Gaussian measurement matrix and white Gaussian noise, the condition (13) is necessary for asymptotic reliable detection of the sparsity pattern. In addition, under the condition (10) on the minimum component magnitude, the scaling (13) is also sufficient. We thus conclude that OMP requires an identical scaling in the number of measurements to lasso. Therefore, at least for sparsity pattern recovery from measurements with large random Gaussian measurement matrices and high SNR, there is no additional performance improvement with the more complex lasso method over OMP.

V. Threshold Selection and Stopping Conditions 

In many problems, the sparsity level $k$ is not known a priori and must be detected as part of the estimation process. In OMP, the sparsity level of the estimate vector is precisely the number of iterations conducted before the algorithm terminates. Thus, reliable sparsity level estimation requires a good stopping condition.

When the measurements are noise-free and one is concerned only with exact signal recovery, the optimal stopping condition is simple: the algorithm should simply stop whenever there is no more error; that is, $\rho^*(t) = 0$ in (6). However, with noise, selecting the correct stopping condition requires some care. The OMP method as described in Algorithm 1 uses a stopping condition based on testing if $\rho^*(t) > \mu$ for some threshold $\mu$.

One of the appealing features of Theorem 1 is that it provides a simple sufficient condition under which this threshold mechanism will detect the correct sparsity level. Specifically, Theorem 1 provides a range $k \in [k_{\min}, k_{\max}]$ under which there exists a threshold such that the OMP algorithm will terminate in the correct number of iterations. The larger the number of measurements $m$, the wider one can make the range $[k_{\min}, k_{\max}]$. The formula for the threshold level is given later in (22).

In practice, one may deliberately want to stop the OMP 
algorithm with fewer iterations than the "true" sparsity level. 
As the OMP method proceeds, the detection becomes less 
reliable and it is sometimes useful to stop the algorithm 
whenever there is a high chance of error. Stopping early may 
miss some small entries, but it may result in an overall better 
estimate by not introducing too many erroneous entries or 
entries with too much noise. However, since our analysis is 
only concerned with exact sparsity pattern recovery, we do not 
consider this type of stopping condition. 

VI. Numerical Simulations 

To verify the above analysis, we simulated the OMP algorithm with fixed signal dimension $n = 100$ and different sparsity levels $k$, numbers of measurements $m$, and randomly-generated vectors $x$.

In the first experiment, $x \in \mathbb{R}^n$ was generated with $k$ randomly placed nonzero values, with all the nonzero entries having the same magnitude $|x_j| = C$ for some $C > 0$. Following Assumption 1(e), the measurement matrix $A \in \mathbb{R}^{m\times n}$ and noise vector $w \in \mathbb{R}^m$ were generated with i.i.d. $\mathcal{N}(0, 1/m)$ entries. Using (14) and the fact that $x$ has $k$ nonzero entries with power $C^2$, the SNR is given by

$$\mathrm{SNR} = \|x\|^2 = kC^2,$$

so the SNR can be controlled by varying $C$.

Fig. 1 plots the probability that the OMP algorithm incorrectly detected the sparsity pattern for different values of $k$ and $m$. The probability is estimated with 1000 Monte Carlo simulations per $(k, m)$ pair. For each $k$ and $m$, the threshold level $\mu$ was selected as the one with the lowest probability of error, assuming, of course, that the same $\mu$ is used across all 1000 Monte Carlo runs.
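A scaled-down version of this experiment can be sketched as follows. To keep it short, the sketch makes assumptions that differ from the setup above: it stops OMP after exactly $k$ iterations (a known-$k$ rule) instead of searching for the best threshold $\mu$, it draws random signs for the equal-magnitude entries, and it uses far fewer than 1000 trials. The function names are hypothetical.

```python
import numpy as np


def omp_k_steps(y, A, k):
    """OMP run for exactly k iterations (known sparsity level)."""
    support = []
    r = y.copy()
    for _ in range(k):
        rho = (A.T @ r) ** 2 / (r @ r)   # rho(t, j) for all j
        if support:
            rho[support] = -np.inf       # never reselect a chosen column
        support.append(int(np.argmax(rho)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ coef     # residual P(t+1) y
    return set(support)


def misdetection_rate(n, k, m, C, trials, rng, noisy=True):
    """Fraction of trials in which the estimated support is wrong."""
    errors = 0
    for _ in range(trials):
        idx = rng.choice(n, size=k, replace=False)
        x = np.zeros(n)
        x[idx] = C * rng.choice([-1.0, 1.0], size=k)  # |x_j| = C, SNR = k C^2
        A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
        w = rng.normal(0.0, 1.0 / np.sqrt(m), size=m) if noisy else np.zeros(m)
        y = A @ x + w
        errors += omp_k_steps(y, A, k) != set(idx.tolist())
    return errors / trials
```

For instance, `misdetection_rate(100, 20, 150, C=np.sqrt(5.0), trials=200, rng=np.random.default_rng(0))` corresponds to $k = 20$ at 20 dB SNR ($kC^2 = 100$). Because the known-$k$ rule replaces the threshold search, its error rates need not match Fig. 1.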

The solid curve in Fig. 1 is the theoretical number of measurements in (13) from Theorem 1 that guarantees exact sparsity recovery. The formula is theoretically valid as $n \to \infty$ and $\mathrm{SNR} \to \infty$. At finite problem sizes, the probability of error for $m$ satisfying (13) will be nonzero. However, Fig. 1 shows that for the problem size in the simulation, the probability of error for OMP is indeed low for values of $m$ greater than the theoretical level. When there is no noise (i.e., $\mathrm{SNR} = \infty$), the probability of error is between 3 and 5% for most values of $k$. When the SNR is 20 dB, the probability of error is between 15 and 20%. In either case, the formula provides a reasonable prediction of the threshold in the number of measurements at which the OMP method succeeds.

Fig. 1. OMP performance prediction. The colored bars show the probability of sparsity pattern misdetection based on 1000 Monte Carlo simulations of the OMP algorithm. The signal dimension is fixed to $n = 100$ and the error probability is plotted against the number of measurements $m$ and sparsity level $k$. The solid black curve shows the theoretical number of measurements $m = 2k\log(n-k)$ sufficient for asymptotic reliable detection.

Theorem 1 is only a sufficient condition. It is possible that for some $x$, OMP could require a number of measurements less than predicted by (13). That is, the number of measurements (13) may not be necessary.

To illustrate such a case, we consider vectors with a nonzero dynamic range of component magnitudes. Fig. 2 shows the probability of sparsity pattern detection as a function of $m$ for vectors $x$ with different dynamic ranges. Specifically, the $k$ nonzero entries of $x$ were chosen to have powers uniformly distributed in a range of 0, 10 and 20 dB. In this simulation, we used $k = 20$ and $n = 100$, so the sufficient condition predicted by (13) is $m \approx 136$. When the dynamic range is 0 dB, all the nonzero entries have equal magnitude, and the probability of error at the value $m = 136$ is approximately 3%. However, with a dynamic range of 10 dB, the same probability of error can be achieved with $m \approx 105$ measurements, a value significantly below the sufficient condition in (13). With a dynamic range of 20 dB, the number of measurements decreases further to $m \approx 75$.

This possible benefit of dynamic range in OMP-like algorithms has been observed in [28], [29] and in sparse Bayesian learning [30], [31]. A valuable line of future research would be to see if this benefit can be quantified. That is, it would be useful to develop a sufficient condition tighter than (13) that accounts for the dynamic range of the signals.
Fig. 2. OMP performance and dynamic range. Plotted is the probability of sparsity pattern detection as a function of the number of measurements for random vectors $x$ with various dynamic ranges. In all cases, $n = 100$, $k = 20$ and $\mathrm{SNR} = \infty$.

VII. Conclusions and Future Work

We have provided an improved scaling law on the number of measurements for asymptotic reliable sparsity pattern detection with OMP. Most importantly, the scaling law exactly matches the scaling needed by lasso under similar conditions.

However, much about the performance of OMP is still not fully understood. In particular, our analysis is limited to high SNR. It would be interesting to see if reasonable sufficient conditions can be derived for finite SNR as well. Also, our analysis has been restricted to exact sparsity pattern recovery. However, in many problems, especially with noise, it is not necessary to detect every element in the sparsity pattern. It would be useful if partial support recovery results such as those in [32]-[34] can be obtained for OMP.

VIII. Proof of Theorem 1

A. Proof Outline

The main difficulty in analyzing OMP is the statistical dependencies between iterations in the OMP algorithm. Following along the lines of the Tropp-Gilbert proof in [10], we avoid these difficulties by considering the following alternate "genie" algorithm. A similar alternate algorithm is analyzed in [28] as well.

1) Initialize $t = 0$ and $I_{\rm true}(0) = \emptyset$.

2) Compute $P_{\rm true}(t)$, the projection operator onto the orthogonal complement of the span of $\{a_i, i \in I_{\rm true}(t)\}$.

3) For all $j = 1, \ldots, n$, compute

$$\rho_{\rm true}(t,j) = \frac{|a_j' P_{\rm true}(t) y|^2}{\|P_{\rm true}(t) y\|^2}, \tag{16}$$

and let

$$[\rho^*_{\rm true}(t), i^*(t)] = \max_{j \in I_{\rm true}} \rho_{\rm true}(t,j). \tag{17}$$

4) If $t < k$, set $I_{\rm true}(t+1) = I_{\rm true}(t) \cup \{i^*(t)\}$. Increment $t = t+1$ and return to step 2.

5) Otherwise stop. The final estimate of the sparsity pattern is $I_{\rm true}(k)$.

This "genie" algorithm is identical to the regular OMP method in Algorithm 1 except that it runs for precisely $k$ iterations as opposed to using a threshold $\mu$ for the stopping condition. Also, in the maximization in (17), the genie algorithm searches over only the correct indices $j \in I_{\rm true}$. Hence, this genie algorithm can never select an incorrect index $j \notin I_{\rm true}$. Also, as in the regular OMP algorithm, the genie algorithm will never select the same vector twice for almost all vectors $y$. Therefore, after $k$ iterations, the genie algorithm will have selected all the $k$ indices in $I_{\rm true}$ and terminate with correct sparsity pattern estimate

$$I_{\rm true}(k) = I_{\rm true}$$

with probability one.

The reason to consider the sequences $P_{\rm true}(t)$ and $I_{\rm true}(t)$ instead of $P(t)$ and $I(t)$ is that the quantities $P_{\rm true}(t)$ and $I_{\rm true}(t)$ depend only on the vector $y$ and the columns $a_j$ for $j \in I_{\rm true}$. The vector $y$ also only depends on $a_j$ for $j \in I_{\rm true}$ and the noise vector $w$. Hence, $P_{\rm true}(t)$ and $I_{\rm true}(t)$ are statistically independent of all the columns $a_j$, $j \notin I_{\rm true}$. This property will be essential in bounding the "false alarm" probability to be defined shortly.

Now, a simple induction argument shows that if

$$\min_{t=0,\ldots,k-1}\ \max_{j\in I_{\rm true}} \rho_{\rm true}(t,j) > \mu, \tag{18a}$$

$$\max_{t=0,\ldots,k}\ \max_{j\notin I_{\rm true}} \rho_{\rm true}(t,j) < \mu, \tag{18b}$$

then the regular OMP algorithm, Algorithm 1, will terminate in $k$ iterations. Moreover, for all $t$, the OMP algorithm will output $P(t) = P_{\rm true}(t)$, $I(t) = I_{\rm true}(t)$, and $\rho(t,j) = \rho_{\rm true}(t,j)$ for all $t$ and $j$. This will in turn result in the OMP algorithm detecting the correct sparsity pattern

$$\hat{I}_{\rm OMP} = I_{\rm true}.$$

So, we need to show that the two events in (18a) and (18b) occur with high probability.
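The reduction to (18a)-(18b) can be checked on a single simulated instance. The sketch below (hypothetical function name) runs the genie algorithm, which is allowed to use the true support, and returns the left-hand sides of (18a) and (18b). It assumes noisy measurements, so that $P_{\rm true}(k)y = w^\perp \neq 0$ and $\rho_{\rm true}(k,j)$ is well defined. Whenever the first value exceeds the second, any threshold $\mu$ strictly between them makes Algorithm 1 return $I_{\rm true}$.

```python
import numpy as np


def genie_statistics(y, A, true_support):
    """Run the genie algorithm (16)-(17) on one instance.

    Returns (lhs_18a, lhs_18b): the minimum over t = 0..k-1 of the largest
    rho_true(t, j) over correct j, and the maximum over t = 0..k of the
    largest rho_true(t, j) over incorrect j.
    """
    n = A.shape[1]
    true_support = [int(j) for j in true_support]
    k = len(true_support)
    wrong = np.setdiff1d(np.arange(n), true_support)
    selected = []          # I_true(t)
    lhs_18a = np.inf
    lhs_18b = 0.0
    for t in range(k + 1):
        if selected:
            coef, *_ = np.linalg.lstsq(A[:, selected], y, rcond=None)
            r = y - A[:, selected] @ coef   # r = P_true(t) y
        else:
            r = y
        rho = (A.T @ r) ** 2 / (r @ r)     # rho_true(t, j) for every j
        lhs_18b = max(lhs_18b, float(rho[wrong].max()))
        if t == k:
            break  # at t = k only the false-alarm side (18b) is checked
        # (17): maximize over correct indices only; already selected ones
        # have rho_true = 0, so skipping them gives the same maximizer.
        remaining = [j for j in true_support if j not in selected]
        j_star = remaining[int(np.argmax(rho[remaining]))]
        lhs_18a = min(lhs_18a, float(rho[j_star]))
        selected.append(j_star)
    return lhs_18a, lhs_18b
```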

To this end, define the following two probabilities:

$$p_{\rm MD} = \Pr\left( \min_{t=0,\ldots,k-1}\ \max_{j\in I_{\rm true}} \rho_{\rm true}(t,j) \le \mu \right), \tag{19}$$

$$p_{\rm FA} = \Pr\left( \max_{t=0,\ldots,k}\ \max_{j\notin I_{\rm true}} \rho_{\rm true}(t,j) \ge \mu \right). \tag{20}$$

Both probabilities are implicitly functions of $n$. The first term, $p_{\rm MD}$, can be interpreted as a "missed detection" probability, since it corresponds to the event that, at some iteration $t$, the maximum correlation energy $\rho_{\rm true}(t,j)$ on the correct vectors $j \in I_{\rm true}$ falls below the threshold. We call the second term $p_{\rm FA}$ the "false alarm" probability since it corresponds to the maximum energy on one of the "incorrect" indices $j \notin I_{\rm true}$ exceeding the threshold. The above arguments show that

$$\Pr\left(\hat{I}_{\rm OMP} \neq I_{\rm true}\right) \le p_{\rm MD} + p_{\rm FA}.$$
So we need to show that there exists a sequence of thresholds $\mu = \mu(n) > 0$, such that $p_{\rm MD} \to 0$ and $p_{\rm FA} \to 0$ as $n \to \infty$. We will define the threshold level in Section VIII-B. Sections VIII-C and VIII-D then prove that $p_{\rm MD} \to 0$ with this threshold. The difficult part of the proof is to show $p_{\rm FA} \to 0$. This part is proven in Section VIII-G after some preliminary results in Sections VIII-E and VIII-F.

B. Threshold Selection 

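We will first select the threshold sequence $\mu(n)$. Given $\delta > 0$ in (9), let $\epsilon > 0$ be such that

$$\frac{1+\delta}{1+\epsilon} > 1 + \epsilon. \tag{21}$$

Then, define the threshold level

$$\mu = \mu(n) = \frac{2(1+\epsilon)}{m}\log(n - k_{\min}). \tag{22}$$

Observe that since $k \ge k_{\min}$, (22) implies that

$$\mu \ge \frac{2(1+\epsilon)}{m}\log(n-k). \tag{23}$$

Also, since $k \le k_{\max}$, (9), (21) and (22) show that

$$\mu < \frac{1}{(1+\epsilon)k}. \tag{24}$$

Indeed, (9) and (22) give $\mu \le (1+\epsilon)/((1+\delta)k_{\max}) \le (1+\epsilon)/((1+\delta)k)$, and (21) bounds the right-hand side by $1/((1+\epsilon)k)$.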
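The threshold construction can be sketched numerically. Condition (21) holds exactly when $(1+\epsilon)^2 < 1+\delta$, i.e., $\epsilon < \sqrt{1+\delta} - 1$; the sketch below picks half of that bound, which is one admissible choice among many (the function name and this particular $\epsilon$ are assumptions, not part of the proof).

```python
import math


def threshold(n, m, k_min, delta):
    """Threshold level (22), with one admissible epsilon for (21).

    (21) holds iff (1 + eps)^2 < 1 + delta, i.e. eps < sqrt(1 + delta) - 1;
    this sketch takes half of that upper bound.
    """
    eps = 0.5 * (math.sqrt(1.0 + delta) - 1.0)
    mu = 2.0 * (1.0 + eps) * math.log(n - k_min) / m
    return mu, eps
```

Choosing $m$ as the smallest integer satisfying (9), the returned $\mu$ satisfies both (23) and (24) for every $k \in [k_{\min}, k_{\max}]$.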



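We begin by computing the limit of the norms of the measurement vector $y$ and the projected noise vector $w^\perp$ (defined in (27) below).

Lemma 1: The limits

$$\lim_{n\to\infty} \|w^\perp\|^2 = 1, \qquad \lim_{n\to\infty} \frac{1}{\|x\|^2+1}\|y\|^2 = 1$$

hold almost surely and in probability.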

Proof: The vector $w$ is Gaussian, zero mean and white with variance $1/m$ per entry. Therefore, its projection, $w^\perp$, will also be white in the $(m-k)$-dimensional orthogonal complement of the range of $\Phi$ with variance $1/m$ per dimension. Therefore, by the strong law of large numbers

$$\lim_{n\to\infty} \|w^\perp\|^2 = \lim_{n\to\infty} \frac{m-k}{m} = 1,$$

where the last step follows from the fact that (9) implies that $k/m \to 0$.

Similarly, it is easily verified that since $A$ and $w$ have i.i.d. Gaussian entries with variance $1/m$, the vector $y$ is also i.i.d. Gaussian with per-entry variance $(\|x\|^2+1)/m$. Again, the strong law of large numbers shows that

$$\lim_{n\to\infty} \frac{1}{\|x\|^2+1}\|y\|^2 = 1.$$

■
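Lemma 1 can be illustrated numerically. Because $Ax = \Phi x_{\rm true}$ by (25) below, only the $k$ true columns $\Phi$ need to be generated, and $w^\perp$ is the least-squares residual of $w$ on $\Phi$, as in (27). The sizes and the function name are illustrative assumptions; both returned values should be close to one when $k/m$ is small.

```python
import numpy as np


def lemma1_ratios(m, k, C, rng):
    """Return (||w_perp||^2, ||y||^2 / (||x||^2 + 1)) for one random draw."""
    Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, k))  # the k true columns
    x_true = np.full(k, C)
    w = rng.normal(0.0, 1.0 / np.sqrt(m), size=m)
    y = Phi @ x_true + w                                  # y = A x + w
    v, *_ = np.linalg.lstsq(Phi, w, rcond=None)           # v as in (27)
    w_perp = w - Phi @ v                                  # part of w orthogonal to range(Phi)
    return float(w_perp @ w_perp), float(y @ y) / (float(x_true @ x_true) + 1.0)
```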



C. Decomposition Representation and Related Bounds 

To bound the missed detection probability, it is easiest to analyze the OMP algorithm in two separate subspaces: the span of the vectors $\{a_j, j \in I_{\rm true}\}$, and its orthogonal complement. This subsection defines some notation for this orthogonal decomposition and proves some simple bounds. The actual limit of the missed detection probability will then be evaluated in the next subsection, Section VIII-D.

Assume without loss of generality $I_{\rm true} = \{1, 2, \ldots, k\}$, so that the vector $x$ is supported on the first $k$ elements. Let $\Phi$ be the $m \times k$ matrix formed by the $k$ correct columns:

$$\Phi = [a_1, a_2, \ldots, a_k].$$

Also, let $x_{\rm true} = [x_1, x_2, \ldots, x_k]'$ be the vector of the $k$ nonzero entries so that

$$Ax = \Phi x_{\rm true}. \tag{25}$$

Now rewrite the noise vector $w$ as

$$w = \Phi v + w^\perp, \tag{26}$$

where

$$v = (\Phi'\Phi)^{-1}\Phi' w, \qquad w^\perp = w - \Phi v. \tag{27}$$

The vectors $\Phi v$ and $w^\perp$ are, respectively, the projections of the noise vector $w$ onto the $k$-dimensional range space of $\Phi$ and its orthogonal complement. Combining (25) with (26), we can rewrite (1) as

$$y = \Phi z + w^\perp, \tag{28}$$

where

$$z = x_{\rm true} + v. \tag{29}$$
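The decomposition (26)-(29) is a standard orthogonal split and can be verified numerically. A minimal sketch (the function name is hypothetical):

```python
import numpy as np


def decompose_noise(Phi, x_true, w):
    """Split w = Phi v + w_perp as in (26)-(27) and form z from (29)."""
    v = np.linalg.solve(Phi.T @ Phi, Phi.T @ w)   # v = (Phi'Phi)^{-1} Phi' w
    w_perp = w - Phi @ v                          # component orthogonal to range(Phi)
    z = x_true + v                                # z = x_true + v
    return v, w_perp, z
```

The checks below confirm that $\Phi' w^\perp = 0$ and that $\Phi x_{\rm true} + w = \Phi z + w^\perp$, which is (28).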



We next need to compute the limits of the minimum and maximum singular values of $\Phi$.

Lemma 2: Let $\sigma_{\min}(\Phi)$ and $\sigma_{\max}(\Phi)$ be the minimum and maximum singular values of $\Phi$, respectively. Then

$$\lim_{n\to\infty} \sigma_{\min}(\Phi) = \lim_{n\to\infty} \sigma_{\max}(\Phi) = 1,$$

where the limits are in probability.

Proof: Since the matrix $\Phi$ has $\mathcal{N}(0, 1/m)$ i.i.d. entries, the Marcenko-Pastur theorem [35] states that

$$\lim_{n\to\infty} \sigma_{\min}(\Phi) = \lim_{n\to\infty} 1 - \sqrt{k/m}, \qquad \lim_{n\to\infty} \sigma_{\max}(\Phi) = \lim_{n\to\infty} 1 + \sqrt{k/m},$$

where the limits are in probability. The result now follows from (9), which implies that $k/m \to 0$ as $n \to \infty$. ■
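The behavior described by Lemma 2 can be observed numerically: for an $m \times k$ matrix with i.i.d. $\mathcal{N}(0, 1/m)$ entries, the extreme singular values sit near $1 \pm \sqrt{k/m}$, and both approach one as $k/m \to 0$. The sketch below (hypothetical function name, illustrative sizes) computes them with `numpy.linalg.svd`.

```python
import numpy as np


def extreme_singular_values(m, k, rng):
    """Smallest and largest singular values of an m x k N(0, 1/m) matrix."""
    Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, k))
    s = np.linalg.svd(Phi, compute_uv=False)
    return float(s.min()), float(s.max())
```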

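We can also bound the singular values of submatrices of $\Phi$. Given a subset $I \subseteq \{1, 2, \ldots, k\}$, let $\Phi_I$ be the submatrix of $\Phi$ formed by the columns $a_i$ for $i \in I$. Also, let $P_I$ be the projection onto the orthogonal complement of the span of the set $\{a_i, i \in I\}$. We have the following bound.

Lemma 3: Let $I$ and $J$ be any two disjoint subsets of indices such that

$$I \cup J = \{1, 2, \ldots, k\}.$$

Then,

$$\sigma_{\min}\left(\Phi_J' P_I \Phi_J\right) \ge \sigma_{\min}^2(\Phi).$$

Proof: The matrix $S = [\Phi_I\ \Phi_J]$ is identical to $\Phi$ except that the columns may be permuted. In particular, $\sigma_{\min}(S) = \sigma_{\min}(\Phi)$. Therefore,

$$S'S = \begin{bmatrix} \Phi_I'\Phi_I & \Phi_I'\Phi_J \\ \Phi_J'\Phi_I & \Phi_J'\Phi_J \end{bmatrix} \ge \sigma_{\min}^2(S)\,\mathbf{I} = \sigma_{\min}^2(\Phi)\,\mathbf{I}.$$

The Schur complement (see, for example, [36]) now shows that

$$\Phi_J'\Phi_J - \Phi_J'\Phi_I(\Phi_I'\Phi_I)^{-1}\Phi_I'\Phi_J \ge \sigma_{\min}^2(\Phi)\,\mathbf{I},$$

since the inverse of this Schur complement is a principal block of $(S'S)^{-1} \le \sigma_{\min}^{-2}(\Phi)\,\mathbf{I}$. Equivalently,

$$\Phi_J'\left(\mathbf{I} - \Phi_I(\Phi_I'\Phi_I)^{-1}\Phi_I'\right)\Phi_J \ge \sigma_{\min}^2(\Phi)\,\mathbf{I}.$$

The result now follows from the fact that

$$P_I = \mathbf{I} - \Phi_I(\Phi_I'\Phi_I)^{-1}\Phi_I'.$$

■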
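Lemma 3 can be spot-checked on random partitions of the columns of a Gaussian $\Phi$. The sketch below (hypothetical function name) forms $P_I$ explicitly and compares $\sigma_{\min}(\Phi_J' P_I \Phi_J)$, computed as the smallest eigenvalue of this symmetric positive semidefinite matrix, with $\sigma_{\min}^2(\Phi)$.

```python
import numpy as np


def lemma3_sides(Phi, I, J):
    """Return (sigma_min(Phi_J' P_I Phi_J), sigma_min(Phi)^2)."""
    m = Phi.shape[0]
    Phi_J = Phi[:, J]
    if len(I) > 0:
        Phi_I = Phi[:, I]
        # P_I = I - Phi_I (Phi_I' Phi_I)^{-1} Phi_I'
        P_I = np.eye(m) - Phi_I @ np.linalg.solve(Phi_I.T @ Phi_I, Phi_I.T)
    else:
        P_I = np.eye(m)  # empty I: no columns projected out
    M = Phi_J.T @ P_I @ Phi_J
    # M is symmetric positive semidefinite, so sigma_min(M) = lambda_min(M)
    lhs = float(np.linalg.eigvalsh(M).min())
    rhs = float(np.linalg.svd(Phi, compute_uv=False).min() ** 2)
    return lhs, rhs
```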

We also need the following tail bound on chi-squared random variables.

Lemma 4: Suppose $X_i$, $i = 1, 2, \ldots$, is a sequence of real-valued, scalar Gaussian random variables with $X_i \sim \mathcal{N}(0, 1)$. The variables need not be independent. Let $M_k$ be the maximum

$$M_k = \max_{i=1,\ldots,k} |X_i|^2.$$

Then

$$\limsup_{k\to\infty} \frac{M_k}{2\log(k)} \le 1,$$

where the limit is in probability.

Proof: See, for example, [28]. ■
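Lemma 4 can be illustrated with both independent and correlated samples. In the correlated case below, $X_i = (g_0 + g_i)/\sqrt{2}$ with i.i.d. standard normal $g_0, g_i$, so each $X_i$ is still $\mathcal{N}(0,1)$ (this construction is an illustrative assumption). Convergence is slow, so at finite $k$ the ratio can land slightly above one; the check below therefore uses a loose bound.

```python
import numpy as np


def max_ratio(X):
    """M_k / (2 log k) for a length-k sample X of N(0, 1) variables."""
    k = X.size
    return float(np.max(X ** 2) / (2.0 * np.log(k)))
```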
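This bound permits us to bound the minimum component of $z$.

Lemma 5: Let $z_{\min}$ be the minimum component value

$$z_{\min} = \min_{j=1,\ldots,k} |z_j|. \tag{30}$$

Then

$$\liminf_{n\to\infty} \frac{z_{\min}}{x_{\min}} \ge 1,$$

where the limit is in probability and $x_{\min}$ is defined in (11).

Proof: Since $w$ is zero mean and Gaussian, so is $v$ as defined in (27). Also, the covariance of $v$ is bounded above by

$$E[vv'] \stackrel{(a)}{=} (\Phi'\Phi)^{-1}\Phi' \left(E[ww']\right) \Phi(\Phi'\Phi)^{-1} \stackrel{(b)}{=} \frac{1}{m}(\Phi'\Phi)^{-1} \stackrel{(c)}{\le} \frac{1}{m\,\sigma_{\min}^2(\Phi)}\,\mathbf{I},$$

where (a) follows from the definition of $v$ in (27); (b) follows from the assumption that $E[ww'] = (1/m)\mathbf{I}_m$; and (c) is a basic property of singular values. This implies that for every $i \in \{1, 2, \ldots, k\}$,

$$E|v_i|^2 \le \frac{1}{m\,\sigma_{\min}^2(\Phi)}.$$

Applying Lemma 4 shows that

$$\limsup_{k\to\infty} \frac{m\,\sigma_{\min}^2(\Phi)\,v_{\max}^2}{2\log(k)} \le 1, \tag{31}$$

where

$$v_{\max} = \max_{i=1,\ldots,k} |v_i|.$$

Therefore,

$$\begin{aligned}
\lim_{n\to\infty} \frac{v_{\max}^2}{x_{\min}^2}
&\stackrel{(a)}{=} \lim_{n\to\infty} \frac{m\,\sigma_{\min}^2(\Phi)\,v_{\max}^2}{2\log(k)} \cdot \frac{2\log(k)}{m\,x_{\min}^2} \\
&\stackrel{(b)}{\le} \lim_{n\to\infty} \frac{2\log(k)}{m\,x_{\min}^2} \\
&\stackrel{(c)}{\le} \lim_{n\to\infty} \frac{2\log(n-k)}{m\,x_{\min}^2} \\
&\stackrel{(d)}{\le} \lim_{n\to\infty} \frac{1}{(1+\delta)k\,x_{\min}^2} \\
&\stackrel{(e)}{=} 0,
\end{aligned}$$

where all the limits are in probability and (a) follows from Lemma 2; (b) follows from (31); (c) follows from the fact that $k < n/2$ and hence $k < n-k$; (d) follows from (9); and (e) follows from (10). Now, for $j \in \{1, 2, \ldots, k\}$,

$$|z_j| \ge |x_j| - |v_j|,$$

and therefore,

$$z_{\min} \ge x_{\min} - v_{\max}.$$

Hence,

$$\liminf_{n\to\infty} \frac{z_{\min}}{x_{\min}} \ge 1 - \lim_{n\to\infty} \frac{v_{\max}}{x_{\min}} = 1,$$

where again the limit is in probability. ■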

D. Probability of Missed Detection 

With the bounds in the previous section, we can now show that the probability of missed detection goes to zero. The proof is similar to Tropp and Gilbert's proof in [10] with some modifications to account for the noise.

For any $t \in \{0, 1, \ldots, k\}$, let $J(t) = \mathcal{I}_{\rm true} \cap \mathcal{I}_{\rm true}(t)^c$, which is the set of indices $j \in \mathcal{I}_{\rm true}$ that are not yet detected in iteration $t$ of the genie algorithm in Section VIII-A. Then
$$\Phi z = \Phi_{\mathcal{I}_{\rm true}(t)}\, z_{\mathcal{I}_{\rm true}(t)} + \Phi_{J(t)}\, z_{J(t)}, \tag{32}$$
where (using the notation of the previous subsection) $\Phi_{\mathcal{I}}$ denotes the submatrix of $\Phi$ formed by the columns with indices $i \in \mathcal{I}$, and $z_{\mathcal{I}}$ denotes the corresponding subvector.

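Now since $P_{\rm true}(t)$ is the projection onto the orthogonal complement of the span of $\{a_i,\ i \in \mathcal{I}_{\rm true}(t)\}$,
$$P_{\rm true}(t)\,\Phi_{\mathcal{I}_{\rm true}(t)} = 0. \tag{33}$$
Also, since $w^\perp$ is orthogonal to $a_i$ for all $i \in \mathcal{I}_{\rm true}$ and $\mathcal{I}_{\rm true}(t) \subseteq \mathcal{I}_{\rm true}$,
$$P_{\rm true}(t)\,w^\perp = w^\perp. \tag{34}$$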
Therefore,
$$P_{\rm true}(t)y \overset{(a)}{=} P_{\rm true}(t)\left(\Phi z + w^\perp\right) \overset{(b)}{=} P_{\rm true}(t)\left(\Phi_{J(t)}z_{J(t)} + w^\perp\right) \overset{(c)}{=} P_{\rm true}(t)\Phi_{J(t)}z_{J(t)} + w^\perp, \tag{35}$$
where (a) follows from (28); (b) follows from (32) and (33); and (c) follows from (34).

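Now using (34) and the fact that $w^\perp$ is orthogonal to $a_i$ for all $i \in \mathcal{I}_{\rm true}$, we have
$$a_i' P_{\rm true}(t)\,w^\perp = a_i' w^\perp = 0 \tag{36}$$
for all $i \in \mathcal{I}_{\rm true}$. Since the columns of $\Phi_{J(t)}$ are formed by vectors $a_i$ with $i \in \mathcal{I}_{\rm true}$,
$$\Phi_{J(t)}' P_{\rm true}(t)\,w^\perp = 0. \tag{37}$$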

Combining (35) and (37),
$$\|P_{\rm true}(t)y\|^2 = \|P_{\rm true}(t)\Phi_{J(t)}z_{J(t)}\|^2 + \|w^\perp\|^2. \tag{38}$$
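As a sanity check of (38), the pure-Python sketch below builds a small random instance and compares both sides numerically. The dimensions, the coefficient vector `z`, and the split into detected and missed indices are hypothetical choices. `w_perp` is constructed to be orthogonal to every true column, which is the property of $w^\perp$ used in (34) and (36). Projections are applied through a Gram-Schmidt basis rather than an explicit matrix.

```python
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def orthonormal_basis(cols):
    """Gram-Schmidt orthonormal basis for span(cols); cols are assumed linearly independent."""
    basis = []
    for c in cols:
        v = project_out(basis, c)
        norm = dot(v, v) ** 0.5
        basis.append([x / norm for x in v])
    return basis


def project_out(basis, u):
    """Apply P = I - QQ', the projection onto the orthogonal complement of span(basis)."""
    v = list(u)
    for q in basis:
        coef = dot(q, v)
        v = [x - coef * qx for x, qx in zip(v, q)]
    return v


rng = random.Random(1)
m, k = 40, 5
# Columns a_j, j in I_true, with i.i.d. N(0, 1/m) entries (they play the role of Phi).
phi = [[rng.gauss(0.0, m ** -0.5) for _ in range(m)] for _ in range(k)]
z = [rng.gauss(0.0, 1.0) for _ in range(k)]
# w_perp: Gaussian noise with its component in range(Phi) removed.
w_perp = project_out(orthonormal_basis(phi), [rng.gauss(0.0, m ** -0.5) for _ in range(m)])
y = [sum(z[j] * phi[j][r] for j in range(k)) + w_perp[r] for r in range(m)]

detected = [0, 2]   # hypothetical I_true(t)
missed = [1, 3, 4]  # J(t) = I_true minus I_true(t)
basis_t = orthonormal_basis([phi[j] for j in detected])
p_y = project_out(basis_t, y)
p_phi_z = project_out(basis_t, [sum(z[j] * phi[j][r] for j in missed) for r in range(m)])

lhs = dot(p_y, p_y)
rhs = dot(p_phi_z, p_phi_z) + dot(w_perp, w_perp)
print(f"||P y||^2 = {lhs:.12f}   ||P Phi_J z_J||^2 + ||w_perp||^2 = {rhs:.12f}")
```

The two printed values should agree up to floating-point rounding. The detected columns drop out because of (33), and the cross term vanishes because of (37).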

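Now for all $t$, we have that
$$\begin{aligned}
\max_{j\in\mathcal{I}_{\rm true}} \rho_{\rm true}(t,j)
&\overset{(a)}{=} \frac{1}{\|P_{\rm true}(t)y\|^2}\max_{j\in\mathcal{I}_{\rm true}} |a_j'P_{\rm true}(t)y|^2 \\
&\overset{(b)}{=} \frac{1}{\|P_{\rm true}(t)y\|^2}\max_{j\in J(t)} |a_j'P_{\rm true}(t)y|^2 \\
&\overset{(c)}{=} \frac{\|\Phi_{J(t)}'P_{\rm true}(t)y\|_\infty^2}{\|P_{\rm true}(t)y\|^2} \\
&\overset{(d)}{\ge} \frac{\|\Phi_{J(t)}'P_{\rm true}(t)y\|_2^2}{|J(t)|\,\|P_{\rm true}(t)y\|^2} \\
&\overset{(e)}{=} \frac{\|\Phi_{J(t)}'P_{\rm true}(t)\Phi_{J(t)}z_{J(t)}\|^2}{|J(t)|\,\|P_{\rm true}(t)y\|^2} \\
&\overset{(f)}{=} \frac{\|\Phi_{J(t)}'P_{\rm true}(t)\Phi_{J(t)}z_{J(t)}\|^2}{|J(t)|\left(\|P_{\rm true}(t)\Phi_{J(t)}z_{J(t)}\|^2 + \|w^\perp\|^2\right)} \\
&\overset{(g)}{\ge} \frac{\sigma_{\min}^2\!\left(\Phi_{J(t)}'P_{\rm true}(t)\Phi_{J(t)}\right)\|z_{J(t)}\|^2}{|J(t)|\left(\sigma_{\max}^2(\Phi)\|z_{J(t)}\|^2 + \|w^\perp\|^2\right)} \\
&\overset{(h)}{\ge} \frac{\sigma_{\min}^4(\Phi)\,\|z_{J(t)}\|^2}{|J(t)|\left(\sigma_{\max}^2(\Phi)\|z_{J(t)}\|^2 + \|w^\perp\|^2\right)} \\
&\overset{(i)}{\ge} \frac{\sigma_{\min}^4(\Phi)\,z_{\min}^2}{k\,\sigma_{\max}^2(\Phi)\,z_{\min}^2 + \|w^\perp\|^2},
\end{aligned} \tag{39}$$
where (a) follows from the definition of $\rho_{\rm true}(t,j)$; (b) follows from the fact that $P_{\rm true}(t)a_j = 0$ for all $j \in \mathcal{I}_{\rm true}(t)$, and hence the maximum will occur on the set $j \in \mathcal{I}_{\rm true}\cap\mathcal{I}_{\rm true}(t)^c = J(t)$; (c) follows from the fact that $\Phi_{J(t)}$ is the matrix of the columns $a_j$ with $j \in J(t)$; (d) follows from the bound $\|v\|_2^2 \le d\,\|v\|_\infty^2$ for any $v \in \mathbb{R}^d$; (e) follows from (35) and (37); (f) follows from (38); (g) follows from the bound $\|Mu\| \ge \sigma_{\min}(M)\|u\|$ in the numerator and, in the denominator, from the fact that $P_{\rm true}(t)$ is a projection operator and hence
$$\sigma_{\max}\big(P_{\rm true}(t)\Phi_{J(t)}\big) \le \sigma_{\max}\big(\Phi_{J(t)}\big) \le \sigma_{\max}(\Phi);$$
(h) follows from Lemma 3; and (i) follows from the bound
$$\|z_{J(t)}\|^2 \ge |J(t)|\,z_{\min}^2$$
and $|J(t)| \le k$. Therefore,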



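$$\begin{aligned}
\liminf_{n\to\infty}\ \min_{t=0,\ldots,k-1}\ \max_{j\in\mathcal{I}_{\rm true}} \frac{1}{\mu}\,\rho_{\rm true}(t,j)
&\overset{(a)}{\ge} \liminf_{n\to\infty} \frac{1}{\mu}\,\frac{\sigma_{\min}^4(\Phi)\,z_{\min}^2}{k\,\sigma_{\max}^2(\Phi)\,z_{\min}^2 + \|w^\perp\|^2} \\
&\overset{(b)}{\ge} \liminf_{n\to\infty} \frac{1}{\mu}\,\frac{z_{\min}^2}{k\,z_{\min}^2 + 1} \\
&\overset{(c)}{\ge} \liminf_{n\to\infty} \frac{1}{\mu}\,\frac{x_{\min}^2}{k\,x_{\min}^2 + 1} \\
&\overset{(d)}{\ge} \liminf_{n\to\infty} \frac{1}{k\mu} \\
&\overset{(e)}{>} 1,
\end{aligned} \tag{40}$$
where (a) follows from (39); (b) follows from Lemmas 1 and 2; (c) follows from Lemma 5; (d) follows from the assumption of the theorem that $kx_{\min}^2 \to \infty$; and (e) follows from (24). The definition of $p_{\rm MD}$ in (19) now shows that
$$\lim_{n\to\infty} p_{\rm MD} = 0.$$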



E. Bounds on Normalized Brownian Motions 

Let $B(t)$ be a standard Brownian motion. Define the normalized Brownian motion $S(t)$ as the process
$$S(t) = \frac{1}{\sqrt{t}}\,B(t), \quad t > 0. \tag{41}$$
We call the process normalized since
$$E|S(t)|^2 = \frac{E[B(t)^2]}{t} = 1.$$



We first characterize the autocorrelation of this process. 

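Lemma 6: If $t > s$, the normalized Brownian motion has autocorrelation
$$E[S(t)S(s)] = \sqrt{s/t}.$$

Proof: Write
$$S(t) = \frac{1}{\sqrt{t}}\big(B(s) + B(t) - B(s)\big).$$
Thus,
$$E[S(t)S(s)] = \frac{1}{\sqrt{st}}\,E\big[\big(B(s) + (B(t)-B(s))\big)B(s)\big] \overset{(a)}{=} \frac{1}{\sqrt{st}}\,E\big[B(s)^2\big] \overset{(b)}{=} \frac{s}{\sqrt{st}} = \sqrt{s/t},$$
where (a) follows from the orthogonal increments property of Brownian motions; and (b) follows from the fact that $B(s) \sim \mathcal{N}(0,s)$. ■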
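A quick Monte Carlo check of Lemma 6: the sketch samples $B(s)$ and then adds an independent increment to obtain $B(t)$, exactly as in the proof, and averages $S(t)S(s)$. The helper name `empirical_autocorr`, the sample size, and the seed are illustrative assumptions.

```python
import math
import random


def empirical_autocorr(s: float, t: float, n_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[S(t)S(s)] for S(t) = B(t)/sqrt(t), assuming 0 < s < t."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        b_s = math.sqrt(s) * rng.gauss(0.0, 1.0)            # B(s) ~ N(0, s)
        b_t = b_s + math.sqrt(t - s) * rng.gauss(0.0, 1.0)  # add an independent increment
        acc += (b_t / math.sqrt(t)) * (b_s / math.sqrt(s))
    return acc / n_samples


s, t = 1.0, 4.0
print(f"estimate = {empirical_autocorr(s, t):.4f}, sqrt(s/t) = {math.sqrt(s / t):.4f}")
```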

We now need the following standard Gaussian tail bound. 

Lemma 7: Suppose $X$ is a real-valued, scalar Gaussian random variable, $X \sim \mathcal{N}(0,1)$. Then, for all $u > 0$,
$$\Pr(X^2 > u) \le \sqrt{\frac{2}{\pi u}}\,\exp(-u/2).$$
Proof: See for example [37]. ■
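The deterministic check below compares the exact tail $\Pr(X^2 > u) = \operatorname{erfc}(\sqrt{u/2})$ with a bound of the form in Lemma 7. It assumes the constant is the standard Mills-ratio constant $\sqrt{2/(\pi u)}$. The ratio column shows how tight the bound becomes for large $u$, which is the regime used in Lemmas 8 and 9.

```python
import math


def chi2_1_tail(u: float) -> float:
    """Exact Pr(X^2 > u) for X ~ N(0, 1), i.e. Pr(|X| > sqrt(u)) = erfc(sqrt(u / 2))."""
    return math.erfc(math.sqrt(u / 2.0))


def gaussian_tail_bound(u: float) -> float:
    """Mills-ratio bound sqrt(2 / (pi u)) * exp(-u / 2), valid for u > 0."""
    return math.sqrt(2.0 / (math.pi * u)) * math.exp(-u / 2.0)


for u in (0.5, 1.0, 4.0, 16.0, 64.0):
    exact, bound = chi2_1_tail(u), gaussian_tail_bound(u)
    print(f"u={u:5.1f}  exact={exact:.3e}  bound={bound:.3e}  exact/bound={exact / bound:.4f}")
```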
We next provide a simple bound on the maximum of sample paths of $S(t)$.

Lemma 8: For any $0 < a < b$, let
$$S_{\max}(a,b) = \sup_{t\in[a,b]} |S(t)|.$$
Then, for any $\mu > 0$,
$$\Pr\big(S_{\max}^2(a,b) > \mu\big) \le \sqrt{\frac{8b}{\pi a\mu}}\,\exp\left(-\frac{a\mu}{2b}\right).$$
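Proof: Since $S(t)$ and $-S(t)$ are identically distributed,
$$\Pr\big(S_{\max}^2(a,b) > \mu\big) \le 2\Pr\Big(\sup_{t\in[a,b]} S(t) > \sqrt{\mu}\Big). \tag{42}$$
So, it will suffice to bound the probability of the single-sided event $\sup_{t\in[a,b]} S(t) > \sqrt{\mu}$. For $t \ge 0$, define $B_a(t) = B(a+t) - B(a)$. Then, $B_a(t)$ is a standard Brownian motion independent of $B(a)$. Also,
$$\begin{aligned}
\sup_{t\in[a,b]} S(t) > \sqrt{\mu}
&\implies \sup_{t\in[a,b]} \frac{1}{\sqrt{t}}\,B(t) > \sqrt{\mu} \\
&\implies \sup_{t\in[a,b]} B(t) > \sqrt{a\mu} \\
&\implies B(a) + \sup_{t\in[0,b-a]} B_a(t) > \sqrt{a\mu}.
\end{aligned}$$
Now, the reflection principle (see, for example [38]) states that for any $y \ge 0$,
$$\Pr\Big(\max_{t\in[0,b-a]} B_a(t) > y\Big) = 2\Pr\big(\sqrt{b-a}\,Y > y\big),$$
where $Y$ is a unit-variance, zero-mean Gaussian. For $y < 0$ the left-hand side is at most $1 \le 2\Pr(\sqrt{b-a}\,Y > y)$, so the corresponding inequality holds for every $y$. Also, $B(a) \sim \mathcal{N}(0,a)$, so if we define $X = (1/\sqrt{a})B(a)$, then $X \sim \mathcal{N}(0,1)$. Since $B(a)$ is independent of $B_a(t)$ for all $t \ge 0$, we can condition on $B(a)$ and write
$$\Pr\Big(\sup_{t\in[a,b]} S(t) > \sqrt{\mu}\Big) \le 2\Pr\big(\sqrt{a}\,X + \sqrt{b-a}\,Y > \sqrt{a\mu}\big), \tag{43}$$
where $X$ and $Y$ are independent zero mean Gaussian random variables with unit variance. Now $\sqrt{a}\,X + \sqrt{b-a}\,Y$ has variance
$$E\big[(\sqrt{a}\,X + \sqrt{b-a}\,Y)^2\big] = a + b - a = b,$$
so the right-hand side of (43) equals $\Pr(Z^2 > a\mu/b)$ for $Z \sim \mathcal{N}(0,1)$. Applying Lemma 7 with $u = a\mu/b$ shows that (43) can be bounded by
$$\Pr\Big(\sup_{t\in[a,b]} S(t) > \sqrt{\mu}\Big) \le \sqrt{\frac{2b}{\pi a\mu}}\,\exp\left(-\frac{a\mu}{2b}\right).$$
Substituting this bound in (42) proves the lemma. ■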
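The Monte Carlo sketch below simulates Brownian paths on a uniform grid over $[a,b]$ and compares the empirical frequency of $\sup_t S(t)^2 > \mu$ with the bound $\sqrt{8b/(\pi a\mu)}\exp(-a\mu/(2b))$. That constant assumes the Mills-ratio form of Lemma 7. A grid can only under-estimate the supremum, so the empirical value is itself a lower estimate. The values of `a`, `b`, `mu`, the grid size, and the trial count are hypothetical choices.

```python
import math
import random


def sup_sq_normalized_bm(a, b, n_steps, rng):
    """Grid approximation of sup_{t in [a, b]} S(t)^2, where S(t)^2 = B(t)^2 / t."""
    dt = (b - a) / n_steps
    t = a
    b_t = math.sqrt(a) * rng.gauss(0.0, 1.0)  # B(a) ~ N(0, a)
    best = b_t * b_t / t
    for _ in range(n_steps):
        b_t += math.sqrt(dt) * rng.gauss(0.0, 1.0)  # independent Brownian increment
        t += dt
        best = max(best, b_t * b_t / t)
    return best


def lemma8_bound(a, b, mu):
    """sqrt(8b / (pi a mu)) * exp(-a mu / (2b))."""
    return math.sqrt(8.0 * b / (math.pi * a * mu)) * math.exp(-a * mu / (2.0 * b))


rng = random.Random(0)
a, b, mu, trials = 1.0, 2.0, 4.0, 3_000
hits = sum(sup_sq_normalized_bm(a, b, 200, rng) > mu for _ in range(trials))
print(f"empirical = {hits / trials:.4f}   Lemma 8 bound = {lemma8_bound(a, b, mu):.4f}")
```

For moderate $\mu$ the bound is loose. Lemma 9 is what gives the sharper $e^{-\mu/2}$ rate over long intervals.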
Our next lemma improves the bound for large $\mu$.

Lemma 9: There exist constants $C_1$, $C_2$, and $C_3$ such that for any $0 < a < b$ and $\mu > C_3$,
$$\Pr\big(S_{\max}^2(a,b) > \mu\big) \le \big(C_1 + C_2\log(b/a)\big)\,e^{-\mu/2}.$$



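Proof: Fix any integer $n > 0$, and define $t_i = a(b/a)^{i/n}$ for $i = 0, 1, \ldots, n$. Observe that the $t_i$'s partition the interval $[a,b]$ in that
$$a = t_0 < t_1 < \cdots < t_n = b.$$
Also, let $r = b/a$. Then, $t_{i+1}/t_i = (b/a)^{1/n} = r^{1/n}$. Applying Lemma 8 to each interval in the partition,
$$\Pr\big(S_{\max}^2(a,b) > \mu\big) \le \sum_{i=0}^{n-1}\Pr\big(S_{\max}^2(t_i,t_{i+1}) > \mu\big) \le n\sqrt{\frac{8r^{1/n}}{\pi\mu}}\,\exp\left(-\frac{\mu\,r^{-1/n}}{2}\right). \tag{44}$$
Now, let $\delta > 0$, and for $\mu > \delta$, let
$$n = \left\lceil \frac{\log(r)}{-\log(1-\delta/\mu)} \right\rceil. \tag{45}$$
Then
$$r^{-1/n} \ge 1 - \delta/\mu, \tag{46}$$
and hence
$$\exp\left(-\frac{\mu\,r^{-1/n}}{2}\right) \le e^{\delta/2}\,e^{-\mu/2}. \tag{47}$$
Also, (45) implies that
$$n \le 1 - \frac{\log(r)}{\log(1-\delta/\mu)} \le 1 + \frac{\mu}{\delta}\log(r), \tag{48}$$
where we have used the fact that $\log(1-x) < -x$ for $0 < x < 1$. Combining the bounds (46) and (48) yields
$$\frac{n\,r^{1/n}}{\mu} \le \frac{1 + (\mu/\delta)\log(r)}{\mu - \delta}. \tag{49}$$
Now, pick any $\delta > 0$ and let $C_3 = 2\delta$. Then if $\mu > C_3 = 2\delta$, (49) implies that
$$\frac{n\,r^{1/n}}{\mu} \le \frac{1}{\delta}\big(1 + 2\log(r)\big). \tag{50}$$
Substituting (47) and (50) into (44) shows that
$$\Pr\big(S_{\max}^2(a,b) > \mu\big) \le \big(C_1 + C_2\log(r)\big)\,e^{-\mu/2},$$
where $C_1$ and $C_2$ are constants that depend only on $\delta$. The result now follows from the fact that $r = b/a$. ■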
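The geometric partition in the proof is easy to compute. The sketch below picks $n$ as in (45) and checks numerically that $r^{-1/n} \ge 1 - \delta/\mu$ and $n \le 1 + (\mu/\delta)\log(r)$. It also checks that $n\,r^{1/n}/\mu \le (1 + 2\log(r))/\delta$ when $\mu > 2\delta$. The function name `lemma9_partition` and the parameter values are hypothetical.

```python
import math


def lemma9_partition(a: float, b: float, mu: float, delta: float):
    """Geometric partition t_i = a (b/a)^(i/n) of [a, b], with n chosen as in (45).

    Requires 0 < a < b and mu > delta > 0, so that 0 < 1 - delta/mu < 1.
    """
    r = b / a
    n = math.ceil(math.log(r) / -math.log(1.0 - delta / mu))
    ts = [a * r ** (i / n) for i in range(n + 1)]
    return r, n, ts


a, b, mu, delta = 1.0, 50.0, 30.0, 1.0
r, n, ts = lemma9_partition(a, b, mu, delta)
print("n =", n)
print("r^(-1/n) >= 1 - delta/mu:", r ** (-1.0 / n) >= 1.0 - delta / mu)
print("n <= 1 + (mu/delta) log r:", n <= 1.0 + (mu / delta) * math.log(r))
print("n r^(1/n)/mu <= (1 + 2 log r)/delta:",
      n * r ** (1.0 / n) / mu <= (1.0 + 2.0 * math.log(r)) / delta)
```

The number of subintervals grows roughly like $(\mu/\delta)\log(r)$. This keeps each ratio $t_{i+1}/t_i$ close enough to one that the exponent in Lemma 8 loses only the constant factor $e^{\delta/2}$.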

F. Bounds on Sequences of Projections 

We can now apply the results in the previous subsection to bound the norms of sequences of projections. Let $y \in \mathbb{R}^m$ be any deterministic vector, and let $P(i)$, $i = 0, 1, \ldots, k$, be a deterministic sequence of orthogonal projection operators on $\mathbb{R}^m$. Assume that the sequence $P(i)$ is decreasing in that $P(i)P(j) = P(j)$ for $j > i$ (equivalently, the range of $P(j)$ is contained in the range of $P(i)$).

Lemma 10: Let $a \in \mathbb{R}^m$ be a zero-mean Gaussian random vector with $E[aa'] = I_m$, and define the random variable
$$M = \max_{i=0,\ldots,k} \frac{|a'P(i)y|^2}{\|P(i)y\|^2}.$$
Then there exist constants $C_1$, $C_2$, and $C_3 > 0$ (all independent of the problem parameters) such that $\mu > C_3$ implies
$$\Pr(M > \mu) \le \big(C_1 + C_2\log(r)\big)\,e^{-\mu/2},$$
where $r = \|P(0)y\|^2/\|P(k)y\|^2$.
Proof: Define
$$z_i = \frac{y'P(i)a}{\|P(i)y\|}$$
so that
$$M = \max_{i=0,\ldots,k} |z_i|^2.$$
Since each $z_i$ is the inner product of the Gaussian vector $a$ with a fixed vector, the scalars $\{z_i,\ i = 0, 1, \ldots, k\}$ are jointly Gaussian. Since $a$ has mean zero, so do the $z_i$'s.

To compute the cross-correlations, suppose that $j > i$. Then
$$E[z_iz_j] = \frac{E[y'P(i)aa'P(j)y]}{\|P(i)y\|\,\|P(j)y\|} \overset{(a)}{=} \frac{y'P(i)P(j)y}{\|P(i)y\|\,\|P(j)y\|} \overset{(b)}{=} \frac{y'P(j)y}{\|P(i)y\|\,\|P(j)y\|} = \frac{\|P(j)y\|}{\|P(i)y\|},$$
where (a) uses the fact that $E[aa'] = I_m$; and (b) uses the decreasing property that $P(i)P(j) = P(j)$. Therefore, if we let $t_i = \|P(i)y\|^2$, we have the cross-correlations
$$E[z_iz_j] = \sqrt{t_j/t_i} \tag{51}$$
for all $j > i$. The same computation with $j = i$ gives $E[z_i^2] = 1$. Also observe that since the projection operators are decreasing, so are the $t_i$'s. That is, for $j > i$,
$$t_j = \|P(j)y\|^2 \overset{(a)}{=} \|P(j)P(i)y\|^2 \overset{(b)}{\le} \|P(i)y\|^2 = t_i,$$
where again (a) uses the decreasing property (in the transposed form $P(j)P(i) = P(j)$); and (b) uses the fact that $P(j)$ is a projection operator and hence norm non-increasing.

Now let $S(t)$ be the normalized Brownian motion in (41). Lemma 6 and (51) show that the Gaussian vector
$$z = (z_0, z_1, \ldots, z_k)$$
has the same covariance as the vector of samples of $S(t)$,
$$s = \big(S(t_0), S(t_1), \ldots, S(t_k)\big).$$
Since they are also both zero-mean and Gaussian, they have the same distribution. Hence, for all $\mu$,
$$\Pr(M > \mu) = \Pr\Big(\max_{i=0,\ldots,k}|z_i|^2 > \mu\Big) = \Pr\Big(\max_{i=0,\ldots,k}|S(t_i)|^2 > \mu\Big) \le \Pr\Big(\sup_{t\in[t_k,t_0]}|S(t)|^2 > \mu\Big),$$
where the last step follows from the fact that the $t_i$'s are decreasing and hence $t_k \le t_i \le t_0$ for all $i \in \{0, 1, \ldots, k\}$. The result now follows from Lemma 9, since $t_0/t_k = r$. ■
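The covariance identity (51) can be checked by simulation. The sketch below uses a hypothetical decreasing family of coordinate projections: $P(i)$ zeroes the first $i$ coordinates, so the range of $P(j)$ sits inside the range of $P(i)$ for $j > i$. It then compares a Monte Carlo estimate of $E[z_iz_j]$ with $\sqrt{t_j/t_i}$. The choice of projections, dimensions, indices, and seed are illustrative assumptions, not part of the proof.

```python
import math
import random


def coord_projection(i, v):
    """Hypothetical decreasing family: P(i) zeroes the first i coordinates,
    so range(P(j)) is contained in range(P(i)) and P(i)P(j) = P(j) for j > i."""
    return [0.0] * i + list(v[i:])


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def empirical_corr(y, i, j, n_samples, rng):
    """Monte Carlo estimate of E[z_i z_j], with z_i = y'P(i)a / ||P(i)y|| and a ~ N(0, I_m)."""
    py_i, py_j = coord_projection(i, y), coord_projection(j, y)
    norm_i, norm_j = math.sqrt(dot(py_i, py_i)), math.sqrt(dot(py_j, py_j))
    acc = 0.0
    for _ in range(n_samples):
        a = [rng.gauss(0.0, 1.0) for _ in range(len(y))]
        # y'P(i)a = (P(i)y)'a because P(i) is symmetric.
        acc += (dot(py_i, a) / norm_i) * (dot(py_j, a) / norm_j)
    return acc / n_samples


rng = random.Random(0)
m, k = 12, 4
y = [rng.gauss(0.0, 1.0) for _ in range(m)]
t = [dot(coord_projection(i, y), coord_projection(i, y)) for i in range(k + 1)]  # t_i = ||P(i)y||^2
i, j = 1, 3
estimate = empirical_corr(y, i, j, 40_000, rng)
print(f"estimate = {estimate:.4f}, sqrt(t_j / t_i) = {math.sqrt(t[j] / t[i]):.4f}")
```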



G. Probability of False Alarm 

Recall that all the projection operators $P_{\rm true}(t)$ and the vector $y$ are statistically independent of the vectors $a_j$ for $j \notin \mathcal{I}_{\rm true}$. Since the entries of the matrix $A$ are i.i.d. Gaussian with zero mean and variance $1/m$, the vector $\sqrt{m}\,a_j$ is Gaussian with unit variance. Hence, Lemma 10 shows that there exist constants $C_1$, $C_2$, and $C_3$ such that for any $\lambda > C_3$,
$$\Pr\left(\max_{t=0,\ldots,k} \frac{m\,|a_j'P_{\rm true}(t)y|^2}{\|P_{\rm true}(t)y\|^2} > \lambda\right) \le B\,e^{-\lambda/2}, \tag{52}$$
where $j \notin \mathcal{I}_{\rm true}$ and
$$B = C_1 + C_2\log\left(\frac{\|P_{\rm true}(0)y\|^2}{\|P_{\rm true}(k)y\|^2}\right). \tag{53}$$
Therefore,



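$$\begin{aligned}
p_{\rm FA} &\overset{(a)}{=} \Pr\Big(\max_{t=1,\ldots,k}\ \max_{j\notin\mathcal{I}_{\rm true}} \rho_{\rm true}(t,j) > \mu\Big) \\
&\overset{(b)}{\le} (n-k)\max_{j\notin\mathcal{I}_{\rm true}}\Pr\Big(\max_{t=1,\ldots,k}\rho_{\rm true}(t,j) > \mu\Big) \\
&\overset{(c)}{=} (n-k)\max_{j\notin\mathcal{I}_{\rm true}}\Pr\left(\max_{t=1,\ldots,k}\frac{m\,|a_j'P_{\rm true}(t)y|^2}{\|P_{\rm true}(t)y\|^2} > m\mu\right) \\
&\overset{(d)}{\le} (n-k)\,B\,e^{-m\mu/2} \\
&\overset{(e)}{=} (n-k)\,B\,e^{-(1+\delta)\log(n-k)} = \frac{B}{(n-k)^{\delta}},
\end{aligned} \tag{54}$$
where (a) follows from the definition of $p_{\rm FA}$ in (20); (b) uses the union bound and the fact that $\mathcal{I}_{\rm true}^c$ has $n-k$ elements; (c) follows from the definition of $\rho_{\rm true}(t,j)$; (d) follows from (52) under the condition that $\mu m > C_3$; and (e) follows from (23). By (23) and the hypothesis of the theorem that $n - k \to \infty$,
$$\mu m = (1+\delta)\,2\log(n-k) \to \infty \quad \text{as } n \to \infty.$$
Therefore, for sufficiently large $n$, $\mu m > C_3$ and (54) holds.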
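The arithmetic behind the last step of (54) can be checked directly. With $m\mu = 2(1+\delta)\log(n-k)$, the union bound $(n-k)\,B\,e^{-m\mu/2}$ collapses to $B/(n-k)^{\delta}$. In the sketch, $B$ is held fixed at a hypothetical value for illustration, whereas in the proof $B$ is random through the logarithm in (53). The values of $\delta$, $n$, and $k = n/100$ are likewise assumptions.

```python
import math


def false_alarm_bound(n, k, delta, B):
    """Union bound (n - k) * B * exp(-m*mu/2), with m*mu = 2 (1 + delta) log(n - k)."""
    m_mu = 2.0 * (1.0 + delta) * math.log(n - k)
    return (n - k) * B * math.exp(-m_mu / 2.0)


delta, B = 0.1, 5.0  # hypothetical values; in the proof B depends on y through (53)
for n in (10**3, 10**4, 10**5, 10**6):
    k = n // 100
    print(n, false_alarm_bound(n, k, delta, B), B / (n - k) ** delta)
```

The decay is only polynomial in $n-k$ with exponent $\delta$. This is why the remaining argument must show that $B$ grows more slowly than $(n-k)^{\delta}$.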
Now, since $\mathcal{I}_{\rm true}(0) = \emptyset$, $P_{\rm true}(0) = I$ and therefore
$$P_{\rm true}(0)y = y. \tag{55}$$
Also, $\mathcal{I}_{\rm true}(k) = \mathcal{I}_{\rm true}$ and so $P_{\rm true}(k)$ is the projection onto the orthogonal complement of the range of $\Phi$. Hence $P_{\rm true}(k)\Phi = 0$. Combining this fact with (28) and (34) shows
$$P_{\rm true}(k)y = w^\perp. \tag{56}$$
Therefore,






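$$\begin{aligned}
\limsup_{n\to\infty} p_{\rm FA}
&\overset{(a)}{\le} \limsup_{n\to\infty}\frac{B}{(n-k)^{\delta}} \\
&\overset{(b)}{\le} \limsup_{n\to\infty}\frac{1}{(n-k)^{\delta}}\left(C_1 + C_2\log\frac{\|P_{\rm true}(0)y\|^2}{\|P_{\rm true}(k)y\|^2}\right) \\
&\overset{(c)}{=} \limsup_{n\to\infty}\frac{1}{(n-k)^{\delta}}\left(C_1 + C_2\log\frac{\|y\|^2}{\|w^\perp\|^2}\right) \\
&\overset{(d)}{=} \limsup_{n\to\infty}\frac{1}{(n-k)^{\delta}}\Big(C_1 + C_2\log\big(1 + \|x\|^2\big)\Big) \\
&\overset{(e)}{=} 0,
\end{aligned}$$
where (a) follows from (54); (b) follows from (53); (c) follows from (55) and (56); (d) follows from Lemma 1; and (e) follows from (12). This completes the proof of the theorem.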